@tinbka commented Sep 8, 2014

Nori 2.4.0 has a bug in its Nokogiri parser when it processes XML entity-encoded characters,
which is a common case in non-English SOAP applications.

For example, if we have the following node:

<ns0:ASSIGNEDGROUP>&#x414;&#x435;&#x436;&#x443;&#x440;&#x43D;&#x430;&#x44F; &#x441;&#x43C;&#x435;&#x43D;&#x430; &#x41B;&#x41F;1.5</ns0:ASSIGNEDGROUP>

and we want to convert it to a hash using Nori with the Nokogiri parser, the expected result is:

{"ns0:ASSIGNEDGROUP"=>"Дежурная смена ЛП1.5"}

but the current version of Nori converts it to:

{"ns0:ASSIGNEDGROUP"=>"ДежурнаясменаЛП1.5"}

because each whitespace run in the source XML lies between two other nodes and is therefore discarded as insignificant.

This patch fixes that case without changing behaviour in any other way.
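
To illustrate the failure mode, here is a minimal sketch in plain Ruby. It is neither Nori's code nor this patch: `sax_chunks`, `join_dropping_blank`, and `join_keeping_blank` are hypothetical helpers, and the chunking (one text chunk per character reference or literal run) is an assumption about how the parser reports text.

```ruby
# Hypothetical simulation only; none of these names exist in Nori or Nokogiri.

RAW_TEXT = "&#x414;&#x435;&#x436;&#x443;&#x440;&#x43D;&#x430;&#x44F; " \
           "&#x441;&#x43C;&#x435;&#x43D;&#x430; &#x41B;&#x41F;1.5"

# Decode hexadecimal character references such as "&#x414;" into UTF-8.
def decode_char_refs(text)
  text.gsub(/&#x(\h+);/) { [$1.to_i(16)].pack("U") }
end

# Split element text into the chunks a SAX-style parser might report:
# one per character reference and one per run of literal text.
# (Assumption: the real parser's chunk boundaries may differ.)
def sax_chunks(text)
  text.scan(/&#x\h+;|[^&]+/).map { |chunk| decode_char_refs(chunk) }
end

# Lossy handling: whitespace-only chunks are mistaken for padding
# between nodes and dropped before the text is joined.
def join_dropping_blank(chunks)
  chunks.reject { |chunk| chunk.strip.empty? }.join
end

# Whitespace-preserving handling: join every chunk first, then strip
# only the leading and trailing whitespace of the whole text.
def join_keeping_blank(chunks)
  chunks.join.strip
end

chunks = sax_chunks(RAW_TEXT)
puts join_dropping_blank(chunks) # spaces between the words are lost
puts join_keeping_blank(chunks)  # spaces between the words survive
```

The only difference between the two helpers is when blank text is discarded: per chunk, or once over the joined string.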

tjarratt added a commit that referenced this pull request Sep 22, 2014
Fix Nokogiri parser processing xml entity encoded characters
@tjarratt merged commit 3a9cdb6 into savonrb:master Sep 22, 2014
@tjarratt (Contributor) commented

Thanks for submitting this pull request @tinbka. Your fix looks great -- my only concern is that we don't have any specs covering this. However, the fix looks rather harmless, so I'm willing to backfill tests at a later time.

Thanks for submitting to Nori!
